{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "%load_ext autoreload\n",
    "%autoreload 2Setup\n",
    "To run this notebook, you'll need (do these in order):\n",
    "\n",
    "The suitesparse library:\n",
    "\n",
    "From conda:\n",
    "\n",
    "conda install suitesparse\n",
    "\n",
    "On ubuntu/similar linux\n",
    "\n",
    "apt-get install suitesparse\n",
    "\n",
    "My version of the PySPQR repository (This is where you need suitesparse)\n",
    "\n",
    "https://www.github.com/smithb/PySPQR.git\n",
    "\n",
    "My LSsurf repository\n",
    "\n",
    "https://www.github.com/smithb/LSsurf.git\n",
    "\n",
    "My pointCollection repository:\n",
    "\n",
    "https://www.github.com/smithb/pointCollection.git\n",
    "\n",
    "For each repository, you'll need to clone the repo (git clone [url to .git file]), then cd to the \n",
    "\n",
    "directory that git makes, and type:\n",
    "\n",
    "python3 setup.py install --user \n",
    "\n",
    "Good luck!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%load_ext autoreload\n",
    "%autoreload 2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "from LSsurf.smooth_xytb_fit import smooth_xytb_fit\n",
    "import pointCollection as pc"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Introduction\n",
    "The ATL14/15 algorithm works by fitting a time-varying surface to the data.  The form of the model is:\n",
    "$$\n",
    "z_m(x, y, t) = z_0(x,y) + \\delta z(x, y, t)\n",
    "$$\n",
    "Here $z_0$ is a DEM giving the surface height at time $t_0$, and $dz(x,y,t)$ gives the surface-height change between $t_0$ and $t$ at location $x,y$.  The DEM is represented as a high-resolution grid of elevations, while $dz$ is represented as a set of lower-resolution surfaces, one for each quarter-year interval.  The model is constructed so that the $z_0$ surface for time $t_0$ is uniformally equal to zero.\n",
    "\n",
    "We find the surface by minimizing the quantity:\n",
    "$$\n",
    "R = W_{xx0}^2 \\int (\\nabla^2 z_0)^2 dA + W_{x0} \\int (\\nabla z_0)^2 dA + W_{xxt}\\int (\\nabla^2 \\frac{\\partial\\delta z}{\\partial  t})^2 dAdt + W_{xt}\\int (\\nabla \\frac{\\partial\\delta z}{\\partial  t})^2 dAdt + W_{tt}\\int (\\frac{\\partial^2 \\delta z}{  \\partial t^2})^2 dA + \\sum (\\frac{z_m(x,y,t)-z_i(x,y,t)}{\\sigma_i})^2\n",
    "$$\n",
    "Here $W_{xx0}$ is the inverse of the expected RMS of the second spatial derivatives of the surface height, $W_{x0}$ is the the inverse of the expected RMS of the first derivatives of the surface height, $W_{xxt}$ is the the inverse of the expected RMS of the second spatial derivatives of the $dz/dt$ field, etc.  The last term is the sum of the error-scaled residuals between the data and the model.  I've put some mathematical description of how this model behaves in the attenuation_curves.ipynb notebook in this repo's directory.\n",
    "\n",
    "To construct a surface, we need to specify the data values, the model grid resolutions for the DEM and for the height-change surfaces, the dimensions of the grid, and the expected derivative values.  \n",
    "\n",
    "## Solutions in one dimension (x)\n",
    "Initially, we will demonstrate the fit on a long, skinny domain, to illustrate how the model works in one dimension.  We will specify identical values for the data for two different time epochs, so that there is no time variation in the solution, and all variation in the solution is in the DEM ($z_0$) field."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# define the domain's width in x, y, and time\n",
    "W={'x':1.e4,'y':200,'t':2}\n",
    "# define the grid center:\n",
    "ctr={'x':0., 'y':0., 't':0.}\n",
    "# define the grid spacing\n",
    "spacing={'z0':50, 'dz':50, 'dt':0.25}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# define the data as a sine wave with a wavelength of 2 km and an amplitude of 100m.  \n",
    "x=np.arange(-W['x']/2, W['x']/2, 100)\n",
    "lambda_x=2000\n",
    "amp=100\n",
    "data_sigma=1\n",
    "D=pc.data().from_dict({'x':x, 'y':np.zeros_like(x),'z':-amp*np.cos(2*np.pi*x/lambda_x),\\\n",
    "                       'time':np.zeros_like(x)-0.5, 'sigma':np.zeros_like(x)+data_sigma})\n",
    "# To ensure a time-constant simulation, replicate the data at times -0.5 and 0.5:\n",
    "data=pc.data().from_list([D, D.copy().assign({'time':np.zeros_like(x)+0.5})])\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# define the expected statistics of the surface\n",
    "\n",
    "E_d3zdx2dt=0.0001\n",
    "E_d2z0dx2=0.06\n",
    "E_d2zdt2=5000\n",
    "\n",
    "data_gap_scale=2500\n",
    "E_RMS={'d2z0_dx2':E_d2z0dx2, 'dz0_dx':E_d2z0dx2*data_gap_scale, 'd3z_dx2dt':E_d3zdx2dt, 'd2z_dxdt':E_d3zdx2dt*data_gap_scale,  'd2z_dt2':E_d2zdt2}\n",
    "\n",
    "# run the fit\n",
    "S=smooth_xytb_fit(data=data, ctr=ctr, W=W, spacing=spacing, E_RMS=E_RMS,\n",
    "                 reference_epoch=2, N_subset=None, compute_E=False,\n",
    "                 max_iterations=1,\n",
    "                 VERBOSE=False, dzdt_lags=[1])\n",
    "\n",
    "# plot the results\n",
    "plt.figure()\n",
    "plt.clf()\n",
    "plt.plot(data.x, data.z,'ko', label='data')\n",
    "plt.plot(S['m']['z0'].x, S['m']['z0'].z0[0,:],'r', label='model')\n",
    "plt.legend();\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Reducing the expected derivatives results in a smoother surface that does not fit the data as well:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure()\n",
    "plt.clf()\n",
    "plt.plot(data.x, data.z,'ko', label='data')\n",
    "A_z0={}\n",
    "A_expected={}\n",
    "\n",
    "# data density\n",
    "rd = data.size/W['x']/W['y']\n",
    "\n",
    "for E_d2z in [0.006, 0.001, 0.0003]:\n",
    "    E_RMS['d2z0_dx2'] = E_d2z\n",
    "    # run the fit\n",
    "    S=smooth_xytb_fit(data=data, ctr=ctr, W=W, spacing=spacing, E_RMS=E_RMS,\n",
    "                     reference_epoch=2, N_subset=None, compute_E=False,\n",
    "                     max_iterations=1,\n",
    "                     VERBOSE=False, dzdt_lags=[1])\n",
    "    \n",
    "    plt.plot(S['m']['z0'].x, S['m']['z0'].z0[0,:], label=f'E_d2z0_dx2={E_d2z}')\n",
    "    # calculate the amplitude\n",
    "    A_z0[E_d2z]=np.max(np.abs(S['m']['z0'].z0[0,np.abs(S['m']['z0'].x)<3000]))\n",
    "    # Calculate the expected amplitude\n",
    "    A_expected[E_d2z] = amp /( 1 + 16*E_d2z**-2*np.pi**4/(lambda_x**4*rd)*data_sigma**2)                                 \n",
    "                                              \n",
    "plt.xlabel('x, m')\n",
    "plt.ylabel('z0, m')\n",
    "plt.legend(loc='lower right')\n",
    "E_RMS['d2z0_dx2']= 0.006\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The expected amplitude for a model fit to data representing a sine wave of amplitude $A_d$ with wavelength $\\lambda$ and estimated error $\\sigma$ is:\n",
    "\n",
    "$$A_m = \\frac{A_{d}}{1 + \\frac{16 \\pi^{4}  \\sigma^{2} }{E_{xx}^2\\lambda^{4} \\rho}}$$\n",
    "\n",
    "Here $E_{xx}$ is the expected second spatial derivative of $z0$.  A plot of the recovered model amplitude vs. $E_{xx}$ is consistent with this model:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "Ezz_vals = np.arange(0.0002, 0.007, 0.0001)\n",
    "A_expected = amp /( 1 + 16*np.pi**4*data_sigma**2/(Ezz_vals**2*lambda_x**4*rd)) \n",
    "\n",
    "plt.figure(); plt.clf()\n",
    "plt.plot(Ezz_vals, A_expected, label='expected')\n",
    "temp=np.array(list(A_z0.keys()))\n",
    "plt.plot(temp, np.array([A_z0[key] for key in A_z0.keys()]),'o', label='recovered')\n",
    "plt.xlabel('$E_{xx}$')\n",
    "plt.ylabel('amplitude')\n",
    "plt.legend()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The values recovered from the fit are within a few percent of those expected based on the analytic expression."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Fitting data with gaps\n",
    "If there is a gap in the data, the solution will tend to form a smooth arc over the gap.  The smaller the expected first derivative of the data, the more the solution will flaten out across the gap.  For solutions of this type, the solution will tend to go flat over a distance $data\\_gap\\_scale$ if $E[RMS(dz/dx)] = E[RMS(d^2z/dx^2)]*data\\_gap\\_scale$\n",
    "\n",
    "As you might imagine, we try to set _data\\_gap\\_scale_ to be about half as large as we expect gaps in the data to be, so that large extrapolations don't produce odd values in the solution.  We can demonstrate how the solution behaves in over data gaps by deleting the central 3 km of the data from our previous example, and fitting the model to the remaining data with different values of $data\\_gap\\_scale$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#make a set of data with a gap\n",
    "data_with_gap=data[np.abs(data.x)>1500]\n",
    "\n",
    "\n",
    "plt.figure()\n",
    "plt.clf()\n",
    "plt.plot(data_with_gap.x, data_with_gap.z,'ko', label='data')\n",
    "# run the solution with different data gap scales\n",
    "for this_data_gap_scale in [4000, 2000, 1000, 500, 250]:\n",
    "    E_RMS['dz0_dx'] = E_RMS['d2z0_dx2']*this_data_gap_scale\n",
    "    S=smooth_xytb_fit(data=data_with_gap, ctr=ctr, W=W, spacing=spacing, E_RMS=E_RMS,\n",
    "                     reference_epoch=2, N_subset=None, compute_E=False,\n",
    "                     max_iterations=1,\n",
    "                     VERBOSE=False, dzdt_lags=[1])\n",
    "    \n",
    "    plt.plot(S['m']['z0'].x, S['m']['z0'].z0[0,:], label=f'data gap scale={this_data_gap_scale}')\n",
    "plt.legend(loc='lower right');\n",
    "plt.xlabel('x'); plt.ylabel('h')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The smaller data_gap_scale values result in a flatter solution inside the gap, at the expense of larger misfits.  Excessively small data_gap_scale values are also undesirable because they can introduce artificial flattening in smooth ice-sheet regions with consistent surface slopes.  We set data_gap_scale to 1500 m (equal to half the ICESat-2 pair-to-pair spacing), which seems to produce clean results."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Solutions in one dimension and time (x, t)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let's see what happens when the solution can vary in space and time.  We'll specify a flat surface for t=-0.99 (just after the start of the solution) and a sinusoidal surface for t=0.99 (just before the end).  We will specify that the DEM is for reference epoch 4 (t=0)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "D0=pc.data().from_dict({'x':x, 'y':np.zeros_like(x),'z':np.zeros_like(x),\\\n",
    "                       'time':np.zeros_like(x)-0.99, 'sigma':np.zeros_like(x)+1})\n",
    "D1=pc.data().from_dict({'x':x, 'y':np.zeros_like(x),'z':amp-amp*np.cos(2*np.pi*x/lambda_x),\\\n",
    "                       'time':np.zeros_like(x)+0.99, 'sigma':np.zeros_like(x)+1})\n",
    "data_dt=pc.data().from_list([D0, D1])\n",
    "\n",
    "data_gap_scale=2500\n",
    "E_RMS['d3z_dx2dt'] = 0.006\n",
    "E_RMS['d2z_dxdt'] = 0.006*data_gap_scale\n",
    "E_RMS['d2z0_dx2'] = 0.03\n",
    "E_RMS['dz0_dx'] = 0.03*data_gap_scale\n",
    "\n",
    "S=smooth_xytb_fit(data=data_dt, ctr=ctr, W=W, spacing=spacing, E_RMS=E_RMS,\n",
    "                     reference_epoch=4, N_subset=None, compute_E=False,\n",
    "                     max_iterations=1,\n",
    "                     VERBOSE=False, dzdt_lags=[1])\n",
    "\n",
    "plt.figure();plt.clf()\n",
    "plt.plot(D0.x, D0.z,'x', color='gray', label='data, t=-0.99')\n",
    "plt.plot(D1.x, D1.z,'ko', label='data, t=0.99')\n",
    "\n",
    "for epoch in range(S['m']['dz'].shape[2]):\n",
    "    this_time=S['m']['dz'].time[epoch]\n",
    "    plt.plot(S['m']['dz'].x, S['m']['z0'].z0[2,:]+S['m']['dz'].dz[2,:, epoch], label=f'$\\delta z$, t={this_time}')\n",
    "plt.plot(S['m']['z0'].x, S['m']['z0'].z0[2,:],'k', label='z0', linewidth=2)\n",
    "plt.xlabel('x')\n",
    "plt.ylabel('z')\n",
    "plt.legend();\n",
    "\n",
    "\n",
    "plt.figure(); plt.clf()\n",
    "\n",
    "for epoch in range(S['m']['dz'].shape[2]):\n",
    "    this_time=S['m']['dz'].time[epoch]\n",
    "    plt.plot(S['m']['dz'].x, S['m']['dz'].dz[2,:, epoch], label=f'$\\delta z$, t={this_time}')\n",
    "plt.xlabel('x')\n",
    "plt.ylabel('$\\delta$ z')\n",
    "plt.legend();\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The recovered surface matches the data at t=-1 and t=1, and varies smoothly in between. Because the reference epoch is halfway between the two data sets, its value is halfway between flat surface (at $t \\approx -1$) and the raised sinusoid (at $t \\approx 1$). The $\\delta z$ fields smoothly so that at any point, the surface varies approximately linearly from the $t=-1$ to its $t=1$ solution:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure(6); \n",
    "# find a point close to x=1000 \n",
    "ii=np.argmin(np.abs(S['m']['dz'].x-1000))\n",
    "# plot the time series\n",
    "plt.plot(S['m']['dz'].time, S['m']['dz'].dz[2, ii, :])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If we add another time point at t=-0.25 that is not colinear (in time) with the other time points, we get a smooth time variation at each point:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "D2 = pc.data().from_dict({'x':x, 'y':np.zeros_like(x),'z':-1*amp+np.zeros_like(x),\\\n",
    "                       'time':np.zeros_like(x)-0.25, 'sigma':np.zeros_like(x)+1})\n",
    "\n",
    "data_dt2=pc.data().from_list([D0, D1, D2])\n",
    "\n",
    "\n",
    "S=smooth_xytb_fit(data=data_dt2, ctr=ctr, W=W, spacing=spacing, E_RMS=E_RMS,\n",
    "                     reference_epoch=4, N_subset=None, compute_E=False,\n",
    "                     max_iterations=1,\n",
    "                     VERBOSE=False, dzdt_lags=[1])\n",
    "plt.figure(7); \n",
    "# Find the data points closest x=1000\n",
    "di=np.where(np.abs(data_dt2.x-1000)<2)\n",
    "plt.plot(data_dt2.time[di], data_dt2.z[di],'o', label='data for x=1000')\n",
    "\n",
    "# find a model point close to x=1000 \n",
    "ii=np.argmin(np.abs(S['m']['dz'].x-1000))\n",
    "# plot the recovered time series\n",
    "plt.plot(S['m']['dz'].time, S['m']['dz'].dz[2, ii, :], label='dz for x=1000')\n",
    "plt.plot(S['m']['dz'].time, S['m']['dz'].dz[2, ii, :] + S['m']['z0'].z0[2,ii], label='dz+z0 for x=1000')\n",
    "plt.legend();\n",
    "plt.xlabel('time')\n",
    "plt.ylabel('h')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Editing outliers\n",
    "Outliers are identified based on the distribution of scaled residuals in the data.  We iterate the solution and at each iteration calculate a robust estimate of the standard deviation of residuals in the data ($\\hat{\\sigma}$), then remove the outliers that are larger than $3\\hat{\\sigma}$.  Let's return to the example with two data epochs, add some noise to all the data, and large noise values to a subset of the data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data_dte=pc.data().from_list([D0, D1])\n",
    "data_dte.z += np.random.randn(data_dte.size)\n",
    "# introduce about 10% large outliers\n",
    "outliers = np.argwhere(np.random.rand(data_dte.size) > 0.90).ravel()\n",
    "data_dte.z[outliers] += (np.random.rand(outliers.size)-0.5)*200\n",
    "\n",
    "plt.figure(8)\n",
    "plt.plot(data_dte.x, data_dte.z,'.', label='data')\n",
    "plt.plot(data_dte.x[outliers], data_dte.z[outliers],'r*', label='outliers')\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can see what these outliers do to the solution without editing by running the solution with only one iteration.  Note that we have asked for verbose output from _smooth\\_xytb\\_fit_, which reports on the robust spread and the outliers from each iteration.  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "S=smooth_xytb_fit(data=data_dte, ctr=ctr, W=W, spacing=spacing, E_RMS=E_RMS,\n",
    "                     reference_epoch=4, N_subset=None, compute_E=False,\n",
    "                     max_iterations=1,\n",
    "                     VERBOSE=True, dzdt_lags=[1])\n",
    "plt.figure(9, figsize=[9, 4]); plt.clf()\n",
    "plt.subplot(121)\n",
    "plt.plot(data_dte.x, data_dte.z,'k.', label='data')\n",
    "plt.plot(S['m']['dz'].x, S['m']['z0'].z0[2,:]+S['m']['dz'].dz[2, :, 0],'k', label='solution, t=-1')\n",
    "plt.plot(S['m']['dz'].x, S['m']['z0'].z0[2,:]+S['m']['dz'].dz[2, :, -1],'b', label='solution, t=1')\n",
    "plt.xlabel('x')\n",
    "plt.ylabel('h')\n",
    "\n",
    "plt.subplot(122)\n",
    "r=S['data'].z-S['data'].z_est\n",
    "plt.hist(r, np.arange(-101, 100, 2));\n",
    "plt.xlabel('residual')\n",
    "plt.ylabel('count')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Running the solution for more iterations eliminates a greater share of the outliers.  The acceptance (or non-acceptance) of the data points is stored in the _three\\_sigma\\_edit_ field of S['data']."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "S=smooth_xytb_fit(data=data_dte, ctr=ctr, W=W, spacing=spacing, E_RMS=E_RMS,\n",
    "                     reference_epoch=4, N_subset=None, compute_E=False,\n",
    "                     max_iterations=10,\n",
    "                     VERBOSE=True, dzdt_lags=[1])\n",
    "plt.figure(9, figsize=[9, 4]); plt.clf()\n",
    "plt.subplot(121)\n",
    "d_out=S['data']\n",
    "plt.plot(d_out.x[d_out.three_sigma_edit==1], d_out.z[d_out.three_sigma_edit==1],'k.')\n",
    "plt.plot(d_out.x[d_out.three_sigma_edit==0], d_out.z[d_out.three_sigma_edit==0],'r.')\n",
    "plt.plot(S['m']['dz'].x, S['m']['z0'].z0[2,:]+S['m']['dz'].dz[2, :, 0],'k')\n",
    "plt.plot(S['m']['dz'].x, S['m']['z0'].z0[2,:]+S['m']['dz'].dz[2, :, -1],'b')\n",
    "\n",
    "plt.subplot(122)\n",
    "r=d_out.z-d_out.z_est\n",
    "plt.hist(r, np.arange(-101, 100, 2), color='red')\n",
    "plt.hist(r[d_out.three_sigma_edit==1], np.arange(-101, 100, 2), color='blue');\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This only works well if the constraints allow a fairly good fit between the data and the model.  If the best misfit allowed by the constraint is large, then the histogram of residuals is broad, and the distinction (in residual space) between outliers and datapoints is not as large."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Data biases\n",
    "\n",
    "Let's make a new example, with several cycles of data on the same line.  Each will have a value that is displaced from a trend line to simulate correlated errors (biases) in the data.  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# define simulation parameters\n",
    "sigma_uncorr=0.1\n",
    "bias_mag=0.25\n",
    "#t_vals=np.arange(-0.99, 0.99+0.1, 0.1)\n",
    "t_vals=np.linspace(-.99, 0.99, 9)\n",
    "z_vals=0.5*t_vals\n",
    "# define the bias values\n",
    "bias_vals=np.random.randn(len(z_vals))*bias_mag"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# make dataset\n",
    "x1=np.arange(-W['x']/2, W['x']/2, 5)\n",
    "y1=np.zeros_like(x1)\n",
    "D_list=[]\n",
    "for cycle in range(len(t_vals)):\n",
    "    this_z = np.zeros_like(x1)+z_vals[cycle]+bias_vals[cycle] + np.random.rand(x1.size)*sigma_uncorr\n",
    "    D_list.append(\n",
    "        pc.data().from_dict({'x':x1,'y':y1,'time':np.zeros_like(x1)+t_vals[cycle], 'cycle':np.zeros_like(x1)+cycle,\\\n",
    "                             'z':this_z, 'sigma':np.zeros_like(x1)+sigma_uncorr,'sigma_corr':np.zeros_like(x1)+bias_mag})\n",
    "    )\n",
    "data_biased=pc.data().from_list(D_list)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that we have specified a _sigma\\_corr_ value that describes the correlated error magnitude.  It doesn't get used until we tell the inversion to solve for errors."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# fit the biased data\n",
    "Sb=smooth_xytb_fit(data=data_biased, ctr=ctr, W=W, spacing=spacing, E_RMS=E_RMS,\n",
    "                     reference_epoch=4, N_subset=None, compute_E=False,\n",
    "                     max_iterations=10,\n",
    "                     VERBOSE=False, dzdt_lags=[1])\n",
    "\n",
    "# plot the results:\n",
    "# find a row and a column in the center of the simulation\n",
    "r0_dz, c0_dz, e0_dz = np.round(np.array(Sb['m']['dz'].shape)/2).astype(int)\n",
    "r0_z0, c0_z0 = np.round(np.array(Sb['m']['z0'].shape)/2).astype(int)\n",
    "\n",
    "# plot the model for the center point\n",
    "plt.figure(10); \n",
    "plt.plot(Sb['m']['dz'].time, Sb['m']['dz'].dz[r0_dz, c0_dz, :]+Sb['m']['z0'].z0[r0_z0, c0_z0], label='biased model')\n",
    "plt.plot(t_vals, bias_vals+z_vals, '*', label='data')\n",
    "plt.legend()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The result is that the recovered delta-z signal matches the biased data well.  If we fit the data without taking into account the biases, the large number of data points with the same bias exert a strong influence on the model, so the smoothness constraints don't help much to suppress signals related to the biases.\n",
    "\n",
    "## Estimating data biases\n",
    "\n",
    "We can tell the inversion that there is a bias parameter using the _blas\\_params_ keyword, which takes a list of all parameters over which the bias is correlated: the solution will estimate one bias for each combination of unique values of the parameter in the list, and will set the expected RMS value for each parameter to the median of the correlated error ( _sigma\\_corr_ ) values for the data.  If there are four cycles in the data, there will be four bias estimates.  In solving the ATL14/15 problem, we set _bias\\_params_ to ['rgt','cycle'], so there is one bias estimated for each rgt and each cycle."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "Sbc=smooth_xytb_fit(data=data_biased, ctr=ctr, W=W, spacing=spacing, E_RMS=E_RMS,\n",
    "                     reference_epoch=4, N_subset=None, compute_E=False,\n",
    "                     max_iterations=10,\n",
    "                     VERBOSE=False, dzdt_lags=[1], bias_params=['cycle'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure(11); \n",
    "plt.plot(Sb['m']['dz'].time, Sb['m']['dz'].dz[r0_dz, c0_dz, :]+Sb['m']['z0'].z0[r0_z0, c0_z0], label='no bias estimate')\n",
    "plt.plot(Sbc['m']['dz'].time, Sbc['m']['dz'].dz[r0_dz, c0_dz, :]+Sbc['m']['z0'].z0[r0_z0, c0_z0], label='cycle bias estimated')\n",
    "plt.plot(t_vals, bias_vals+z_vals, '*', label='data')\n",
    "plt.legend()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Letting the inversion take into account the biases in the data results in a much smoother solution, although with larger error estimates, and less ability to recover short-term fluctuations in surface height"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Error estimates"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The algorithm can optionally estimate errors for the _z0_ and _dz_ fields, and the biases.  These error estimates depend on:\n",
    "* the spatial and temporal distribution of the data,\n",
    "* the error magnitude estimates, and \n",
    "* the bias magnitude estimates.  \n",
    "\n",
    "Let's look at the time-dependent solution from earlier, but remove the central portion to make a data gap. We're making the solution a bit wider in the y direction to allow us a bit of space to play with the model resolution. Note that the solution takes much longer when we calculate errors."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data_dt_gap = data_dt[np.abs(data_dt.x) > 1000]\n",
    "W1={'x': 10000.0, 'y': 400, 't': 2}\n",
    "S=smooth_xytb_fit(data=data_dt_gap, ctr=ctr, W=W1, spacing=spacing, E_RMS=E_RMS,\n",
    "                     reference_epoch=4, N_subset=None, compute_E=True,\n",
    "                     max_iterations=1, dzdt_lags=[1])\n",
    "\n",
    "# find a row and a column in the center of the simulation\n",
    "r0_dz, c0_dz, e0_dz = np.round(np.array(S['m']['dz'].shape)/2).astype(int)\n",
    "r0_z0, c0_z0 = np.round(np.array(S['m']['z0'].shape)/2).astype(int)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fig=plt.figure(11)\n",
    "ax=fig.add_subplot(221)\n",
    "# plot the z0 for the model\n",
    "hh=plt.plot(S['m']['z0'].x, S['m']['z0'].z0[r0,:], label='z0')\n",
    "# plot the errors in z0\n",
    "ax.plot(S['m']['z0'].x, S['m']['z0'].z0[r0_z0,:]+S['E']['z0'].z0[r0_z0,:], '--', color=hh[0].get_color())\n",
    "ax.plot(S['m']['z0'].x, S['m']['z0'].z0[r0_z0,:]-S['E']['z0'].z0[r0_z0,:], '--', color=hh[0].get_color())\n",
    "ax.set_ylabel('z0')  \n",
    "    \n",
    "ax=fig.add_subplot(222)\n",
    "ax.plot(S['m']['z0'].x, S['E']['z0'].z0[r0_z0,:])\n",
    "ax.set_ylabel('$\\sigma_{z0}$')\n",
    "    \n",
    "ax=fig.add_subplot(223)\n",
    "hh=ax.plot(S['m']['dz'].x, S['m']['dz'].dz[r0_dz, :, 0])\n",
    "hh=ax.plot(S['m']['dz'].x, S['m']['dz'].dz[r0_dz, :, 0]- S['E']['dz'].dz[r0_dz, :, 0],'--', color=hh[0].get_color() )\n",
    "hh=ax.plot(S['m']['dz'].x, S['m']['dz'].dz[r0_dz, :, 0]+ S['E']['dz'].dz[r0_dz, :, 0],'--', color=hh[0].get_color() )\n",
    "ax.set_ylabel('dz, t=-1')\n",
    "\n",
    "ax=fig.add_subplot(224)\n",
    "ax.plot(S['m']['dz'].x, S['E']['dz'].dz[r0_z0,:, 0])\n",
    "ax.set_ylabel('$\\sigma_{dz, t=-1}$')\n",
    "fig.tight_layout()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We see that the error estimates are small where data are present, then increase across the data gap. Errors are also larger towards the edges of the grid, because farther from the center, fewer datapoints are available to constrain each grid point.\n",
    "\n",
    "This solution took a long time (and a lot of memory) to calculate, beacause the grids had fine resolution. Since we probably don't care if our error estimates have high or low resolution, we can speed up the calculation substantially by degrading the resolution of the error solution, then interpolating back to the original grid resolution.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# reduced-resolution fit:\n",
    "SE = smooth_xytb_fit(data=data_dt_gap, ctr=ctr, W=W1, spacing={'z0':100, 'dz':100, 'dt':0.25}, E_RMS=E_RMS,\n",
    "                     reference_epoch=4, N_subset=None, compute_E=True,\n",
    "                     max_iterations=1, dzdt_lags=[1])\n",
    "\n",
    "# reduced-resolution sigma estimate, interpolated to full resolution\n",
    "sigma_i = SE['E']['z0'].interp(S['m']['z0'].x, S['m']['z0'].y, field='z0', gridded=True)\n",
    "\n",
    "plt.figure(12)\n",
    "plt.plot(S['m']['z0'].x, sigma_0.z0[r0_z0,:], label='full-res')\n",
    "plt.plot(S['m']['z0'].x,  sigma_i[r0_z0,:], label='half-res')\n",
    "plt.set_ylabel('$\\sigma_{z0}')\n",
    "plt.legend();\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The reduced-resolution sigma estimate is very close to the full-resolution estimate, but takes very much less time to calculate."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Quantities derived in error propagation\n",
    "\n",
    "The errors in derived quantities cannot be calculated without an estimate of the covariance structure of the fit model.  Since we are not including the covariance matrix in the output products (because it's too large to even calculate in most cases!) we pre-calculate error estimates for some derived quantities:\n",
    "\n",
    "* _dz\\_dt\\_lag\\_1_ : dz/dt from epoch to epoch\n",
    "* _dz\\_dt\\_lag\\_4_ : dz/dt calculated on an annual basis\n",
    "* _dz\\_bar_ : the error in the central 50% of the grid (typically 40x40 km)\n",
    "* _dz\\_dt\\_bar\\_lag\\_1_ , _dz\\_dt\\_bar\\_lag\\_4_ , dz/dt averaged over the central portion of the grid, at quarter-annual and annual resolution\n",
    "\n",
    "For the central-average quantities, we also provide an value for the area of the central estimate (including the effects of projection distortion and the fraction of the central area that is included in the ice mask). "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "IS2",
   "language": "python",
   "name": "is2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
